
Sunday, 27 January 2019

What is transient keyword in Java?

Background

In one of my earlier posts, I covered what serialization in Java is and how the transient keyword can be used in that context. You can take a look at that post for the basics.
We saw that in the serialization context, instance variables marked as transient are not serialized, and on deserialization they get the default values for their type. In this post, we will look at the transient keyword in more detail with examples.

Transient use cases

  • The transient keyword should be used when you do not want to serialize an instance variable. E.g.
    • An instance of a Logger class. There is no state associated with a logger instance, so we do not need to serialize it.
    • Similarly, any sensitive data such as passwords, which you may not want to serialize.
  • If your Serializable class holds a reference whose type does not implement Serializable, that field must be marked transient; otherwise serialization will fail with a "java.io.NotSerializableException" (see the sketch below).
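
To make this concrete, here is a minimal, self-contained sketch (class and field names are made up for illustration) that serializes an object to an in-memory byte array and reads it back; the transient field comes back as null while the regular field survives:

    import java.io.*;

    public class TransientDemo {

        static class User implements Serializable {
            private static final long serialVersionUID = 1L;
            private String name;               // serialized normally
            private transient String password; // skipped during serialization

            User(String name, String password) {
                this.name = name;
                this.password = password;
            }
        }

        public static void main(String[] args) throws Exception {
            User user = new User("Aniket", "secret");

            // Serialize to an in-memory byte array
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(user);
            }

            // Deserialize it back
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                User copy = (User) ois.readObject();
                System.out.println(copy.name);     // Aniket
                System.out.println(copy.password); // null - transient fields get defaults
            }
        }
    }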

The transient and final keyword

Serialization treats final transient fields a bit differently, so let's take an example to understand this:

//final field 1
public final transient String myName = "Aniket";
//final field 2
public final transient Logger myLogger = LoggerFactory.getLogger(MyClass.class.getName());

Considering the above fields in a class, by our earlier logic neither should be serialized, since both are marked transient. However, if a final variable is initialized with a "constant expression", as myName is above, it will appear to survive serialization: the compiler inlines reads of such constant variables at their use sites, so code accessing myName still sees "Aniket" after deserialization. myLogger, whose initializer is not a constant expression, comes back as null.

On a side note, recall serialVersionUID, which is static and final. It is written to the stream as part of the class metadata rather than as field data; ordinary static variables are never serialized, since static state does not form part of an object's state, which by the very definition of serialization puts it out of scope.

Use of transient keyword in HashMap.

If you look at the HashMap implementation, you can see that the array backing it is marked as transient:


    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry[] table;


If this array is not serialized, how does the instance state get restored on deserialization? The class does implement Serializable -

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {


    private static final long serialVersionUID = 362498820763181265L;


The reason is that the same key does not generate the same identity hash code across JVM instances (unless, of course, you override the hashCode method yourself). The native implementation derives the hash from the object's memory location, which will differ between the object before serialization and the one after deserialization. So there is no guarantee that keys will land in the same bucket at the same position as in the HashMap that was serialized. That is why the array itself is excluded from serialization by marking it transient. So how is the instance state restored?

For this, the HashMap implementation overrides the writeObject and readObject methods. In writeObject, all entries from the entry array are walked in sequence and their keys and values serialized. Similarly, in readObject they are read back in the same order and re-inserted into the freshly created internal entry table, so buckets and positions are recomputed dynamically during deserialization from the same data.
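
Here is a hedged sketch of the same technique applied to a custom class - not HashMap's actual code, just the writeObject/readObject pattern it relies on (class and field names are made up):

    import java.io.*;
    import java.util.HashMap;
    import java.util.Map;

    public class CustomSerializedMap implements Serializable {
        private static final long serialVersionUID = 1L;

        // Hash-dependent structure: excluded from default serialization
        private transient Map<String, String> table = new HashMap<>();

        public void put(String k, String v) { table.put(k, v); }
        public String get(String k) { return table.get(k); }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();       // non-transient fields, if any
            out.writeInt(table.size());     // write entry count first
            for (Map.Entry<String, String> e : table.entrySet()) {
                out.writeObject(e.getKey());    // then each key/value pair
                out.writeObject(e.getValue());
            }
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            int size = in.readInt();
            table = new HashMap<>();        // rebuild; buckets are recomputed
            for (int i = 0; i < size; i++) {
                String k = (String) in.readObject();
                String v = (String) in.readObject();
                table.put(k, v);
            }
        }
    }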

PS: This is a good interview question :)



Related Links

Saturday, 17 June 2017

Counting Semaphore, CountDownLatch, CyclicBarrier - synchronization methods for concurrency

Background

Java 5 introduced a number of concurrency mechanisms for synchronization.



In one of the previous posts, we saw what Reentrant locks are and how they help us achieve concurrency. Along with ReentrantLock and ExecutorService, other concurrency elements were introduced in Java 5, such as -
  • Counting Semaphore
  • CountDownLatch
  • CyclicBarrier
Today we will try to understand these. Understanding them not only helps with multithreading, they are also a popular topic in Java interviews. Let's see them one by one.

Counting Semaphore

A semaphore maintains a number of permits for a resource, and only that many threads can access the resource simultaneously. Once the maximum number of permits is handed out, a thread has to wait until some other thread owning a permit releases it.

As an example, let's consider a simple semaphore with one permit, called a binary semaphore. It is similar to using wait and notify on a shared object.

    public static void main(String args[])
    {
        Semaphore binarySemaphore = new Semaphore(1);
        new Thread(() -> {
            try {
                binarySemaphore.acquire();
                System.out.println("Semaphore permit acquired by : " + Thread.currentThread().getName());
                
            } catch (Exception e) {
                e.printStackTrace();
            }
            finally {
                System.out.println("Semaphore permit getting released by : " + Thread.currentThread().getName());
                binarySemaphore.release();
            }
            
        }).start();
        new Thread(() -> {
            try {
                binarySemaphore.acquire();
                System.out.println("Semaphore permit acquired by : " + Thread.currentThread().getName());
                
            } catch (Exception e) {
                e.printStackTrace();
            }
            finally {
                System.out.println("Semaphore permit getting released by : " + Thread.currentThread().getName());
                binarySemaphore.release();
            }
            
        }).start();
    }
and the output would be -
Semaphore permit acquired by : Thread-0
Semaphore permit getting released by : Thread-0
Semaphore permit acquired by : Thread-1
Semaphore permit getting released by : Thread-1


As you can see from the code above, you acquire a permit using the acquire() method and release it using the release() method.


NOTES :
  1. You can also acquire a permit using acquireUninterruptibly(). This is a blocking call, and the waiting thread cannot be interrupted.
  2. acquire() is also a blocking call, but unlike acquireUninterruptibly() it can be interrupted.
  3. You can also use tryAcquire(), which attempts to acquire a permit and returns immediately: true if a permit was available, false otherwise. So this is a non-blocking call (see the sketch below).
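
To illustrate point 3, here is a hedged sketch of the non-blocking and timed variants (permit counts and timeouts are arbitrary):

    import java.util.concurrent.Semaphore;
    import java.util.concurrent.TimeUnit;

    public class TryAcquireDemo {
        public static void main(String[] args) throws InterruptedException {
            Semaphore semaphore = new Semaphore(1);

            // Non-blocking attempt: returns immediately with true/false
            if (semaphore.tryAcquire()) {
                try {
                    System.out.println("Permit acquired without blocking");
                } finally {
                    semaphore.release();
                }
            }

            // Timed attempt: blocks up to 500 ms, then gives up
            if (semaphore.tryAcquire(500, TimeUnit.MILLISECONDS)) {
                try {
                    System.out.println("Permit acquired within the timeout");
                } finally {
                    semaphore.release();
                }
            } else {
                System.out.println("Gave up waiting for a permit");
            }
        }
    }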

CountDownLatch

This is another synchronization mechanism, in which access to a resource is held off until a predefined number of threads have completed their operations. Say 10 threads are making bread slices, and as soon as 5 slices are ready we want to pack them together for selling. We can use a CountDownLatch here: initialize it with 5, and as soon as 5 threads acknowledge they have finished making slices, packing (probably in a new thread) can start.

So one thread waits for n other threads. Let's see an example -

    public static void main(String args[])
    {
        CountDownLatch countDownLatch = new CountDownLatch(2);
        new Thread(() -> {
            try {
                Thread.sleep(2000);
                System.out.println("Calling countdown by : " + Thread.currentThread().getName());
                countDownLatch.countDown();
            } catch (Exception e) {
                e.printStackTrace();
            }
           
        }).start();
        new Thread(() -> {
            try {
                Thread.sleep(2000);
                System.out.println("Calling countdown by : " + Thread.currentThread().getName());
                countDownLatch.countDown();
            } catch (Exception e) {
                e.printStackTrace();
            }
           
        }).start();
       
        try {
            System.out.println("Waiting for all other threads finish operation");
            countDownLatch.await();
            System.out.println("All other threads finish operation!");
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }


Output :
Waiting for all other threads finish operation
Calling countdown by : Thread-1
Calling countdown by : Thread-0
All other threads finish operation!
As you can see, the main thread calls await() on the CountDownLatch and waits for 2 threads to call countDown() on it. Here n is 2, but you can configure it in the constructor.

You should use this when your use case is to wait for some initial operations to finish before starting another operation.

NOTE :
  1. CountDownLatch is not reusable. Once the count reaches 0, i.e. n threads have called countDown(), the latch is unusable.

CyclicBarrier

CyclicBarrier is yet another synchronization mechanism. In this one, all n threads wait for each other to reach a barrier. Such waiting threads are called parties. The number of parties is set when the CyclicBarrier is created. Each party reaches the barrier and calls await(), which is a blocking call. Once all parties have reached the barrier, i.e. all have called await(), all the threads unblock and proceed with their next task.

A simple use case you can think of is a multiplayer game that does not start until all the players have joined. The players are the parties, whereas the game start is the barrier.

Eg -

    public static void main(String args[])
    {
        CyclicBarrier cyclicBarrier = new CyclicBarrier(3);
        new Thread(() -> {
            try {
                Thread.sleep(2000);
                System.out.println("Player joining : " + Thread.currentThread().getName());
                cyclicBarrier.await();
                System.out.println("Game starting from : " + Thread.currentThread().getName());
            } catch (Exception e) {
                e.printStackTrace();
            }
           
        }).start();
        new Thread(() -> {
            try {
                Thread.sleep(2000);
                System.out.println("Player joining : " + Thread.currentThread().getName());
                cyclicBarrier.await();
                System.out.println("Game starting from : " + Thread.currentThread().getName());
            } catch (Exception e) {
                e.printStackTrace();
            }
           
        }).start();
        new Thread(() -> {
            try {
                Thread.sleep(2000);
                System.out.println("Player joining : " + Thread.currentThread().getName());
                cyclicBarrier.await();
                System.out.println("Game starting from : " + Thread.currentThread().getName());
            } catch (Exception e) {
                e.printStackTrace();
            }
           
        }).start();
       
    }

and the output is -
Player joining : Thread-0
Player joining : Thread-1
Player joining : Thread-2
Game starting from : Thread-2
Game starting from : Thread-0
Game starting from : Thread-1
As you can see, all threads (3 in the above case) wait for each other to reach the barrier. Once they all reach it and call await(), they can all proceed with their further tasks.
NOTE :
  1. cyclicBarrier.reset() puts the barrier back in its initial state; any thread that is waiting at, or has not yet reached, the barrier will terminate with java.util.concurrent.BrokenBarrierException. So a CyclicBarrier can be reused, unlike a CountDownLatch.

  • Both CyclicBarrier and CountDownLatch are used to implement scenarios where one thread waits for one or more other threads to finish their job before it starts processing. The difference that sets them apart is reusability: you cannot reuse a CountDownLatch instance once the count reaches zero and the latch is open, whereas a CyclicBarrier can be reused by resetting the barrier once it is broken.
  • One major difference is that CyclicBarrier takes an optional Runnable task which is run once the common barrier condition is met (see the sketch below).
  • It also allows you to query the number of parties waiting at the barrier and the number required to trip it. Once tripped, the barrier resets and can be used again.
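
Here is a hedged sketch of the optional barrier action mentioned above (player count and messages are arbitrary); the Runnable runs exactly once, after the last party arrives and before any party is released:

    import java.util.concurrent.CyclicBarrier;

    public class BarrierActionDemo {
        public static void main(String[] args) {
            // The Runnable is the barrier action: it runs once all 3 parties arrive
            CyclicBarrier barrier = new CyclicBarrier(3,
                    () -> System.out.println("All players joined, starting game!"));

            for (int i = 0; i < 3; i++) {
                new Thread(() -> {
                    try {
                        System.out.println("Player joining : " + Thread.currentThread().getName());
                        barrier.await(); // blocks until all parties reach the barrier
                        System.out.println("Playing : " + Thread.currentThread().getName());
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }).start();
            }
        }
    }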

Summary

Semaphore : Manages a fixed-size pool of permits for a resource.
CountDownLatch : One or more threads wait for a set of threads to finish their operations.
CyclicBarrier : A set of threads wait for each other until they all reach a specific point.

Related Links

Saturday, 20 May 2017

How ConcurrentHashMap Works Internally in Java

Background

In one of the previous posts we saw how HashMap works -
and how its time complexity for insertion and deletion is O(1) in the normal case. Though this is a great data structure to work with in terms of time complexity, it is not thread safe, which means you cannot use it directly in multithreaded environments without taking additional precautions, like synchronizing put/get on your own. Instead, Java provides a thread-safe implementation called ConcurrentHashMap, which we can use directly in multithreaded environments, e.g. with parallel streams introduced in Java 8.

How ConcurrentHashMap Works Internally in Java

Before we see how it is implemented in Java, let's give it some thought. What are the possible problems with a HashMap? Race conditions and invalid state. Say two writes happen at the same time. Since a write is not an atomic operation, one value may overwrite the other and the map may end up in an inconsistent state. We could obviously synchronize all reads/writes of a HashMap, but that would be very inefficient and hurt performance: it would behave like a single-threaded application, certainly not the behavior we want. To solve this issue, Java provides ConcurrentHashMap, which has thread safety built in. Let's see how -

We know how HashMap works. Internally it stores an array of Entry objects, each of which essentially has a key, a value and a pointer to the next Entry object (the linked list used in case of collision). You can think of each array element as a bucket and each Entry object as a data point containing a key (in case 2 keys have the same hash - a collision), a value and a pointer to the next data element.

Working :
ConcurrentHashMap, as the name suggests, allows concurrent reads/writes to the map, with some limitations. It internally maintains another data structure called segments. Each bucket of the map belongs to one of the segments. The number of segments is called the concurrency level, which determines the number of threads that can write simultaneously. A segment gets locked when data is written/updated/removed. Think of segments as locks used to prevent concurrent writes to the same bucket of the map from causing inconsistency. So as long as writes to the ConcurrentHashMap go to different segments, they can happen in parallel. Reads are completely lock free, i.e. there is no need to acquire a lock for reading; the last updated value is returned.


 Now let's go step by step.

 Concurrency level, Segment array and initialization :
  • First, when you create a ConcurrentHashMap, you can provide a concurrency level. This determines the size of the Segment array, which will always be equal to or greater than the concurrency level. If none is provided, a default of 16 is used, and the number of segments is capped by -
    • static final int DEFAULT_CONCURRENCY_LEVEL = 16;
    • static final int MAX_SEGMENTS = 1 << 16; // slightly conservative
  • Note that the size of the segment table will always be a power of 2. So if you give a concurrency level of 10, the next power of 2, i.e. 16, is picked, and a Segment array of size 16 is created, which implies 16 threads can simultaneously operate on the map.
static final class Segment<K,V> extends ReentrantLock implements Serializable {

    //The number of elements in this segment's region.
    transient volatile int count;
    //The per-segment table. 
    transient volatile HashEntry<K,V>[] table;
}

Putting element in ConcurrentHashMap :

  • To put an element in the map, we first need to determine which segment should handle it. For this we first get the hashCode of the key. Next we apply a supplemental rehash to that hash, which defends against poor-quality hash functions:
     /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because ConcurrentHashMap uses power-of-two length hash tables,
     * that otherwise encounter collisions for hashCodes that do not
     * differ in lower or upper bits.
     */
    private static int hash(int h) {
        // Spread bits to regularize both segment and index locations,
        // using variant of single-word Wang/Jenkins hash.
        h += (h <<  15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h <<   3);
        h ^= (h >>>  6);
        h += (h <<   2) + (h << 14);
        return h ^ (h >>> 16);
    }
  • Once the hash is calculated, you can find the segment it belongs to and delegate the put to that segment's put method as follows -
    public V put(K key, V value) {
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key.hashCode());
        return segmentFor(hash).put(key, hash, value, false);
    }

    final Segment<K,V> segmentFor(int hash) {
        return segments[(hash >>> segmentShift) & segmentMask];
    }


We will see how the segment index is computed with a proper example in a moment. Once put is delegated to a segment, the segment adds the entry to the appropriate bucket within itself.

        V put(K key, int hash, V value, boolean onlyIfAbsent) {
            lock();
            try {
                int c = count;
                if (c++ > threshold) // ensure capacity
                    rehash();
                HashEntry<K,V>[] tab = table;
                int index = hash & (tab.length - 1);
                HashEntry<K,V> first = tab[index];
                HashEntry<K,V> e = first;
                while (e != null && (e.hash != hash || !key.equals(e.key)))
                    e = e.next;

                V oldValue;
                if (e != null) {
                    oldValue = e.value;
                    if (!onlyIfAbsent)
                        e.value = value;
                }
                else {
                    oldValue = null;
                    ++modCount;
                    tab[index] = new HashEntry<K,V>(key, hash, first, value);
                    count = c; // write-volatile
                }
                return oldValue;
            } finally {
                unlock();
            }
        }


Now this is a very interesting method. Let's understand what's happening here.

  • The first call is lock(). Since this is a write/update operation on a bucket of the segment, we need a lock. If you recollect, the Segment class extends ReentrantLock, so each segment is a lock, and you can call lock() and unlock() directly on it.
  • Next it behaves like a normal HashMap: you find the index in the Entry table where the element's hash falls and add it there as a linked-list node.
  • You can see code similar to HashMap's: it updates the value if the key already exists, inserts into the array slot if there is no element there, and prepends to the slot's linked list if elements already exist.
  • Finally, once the operation is complete, it calls unlock() so that other threads can continue updating.
  • Note that lock() is a blocking call.
  • You can also see the call to rehash if the threshold is reached. Like the Entry array in HashMap, each segment's table has a threshold, and when it is reached that table is resized for performance. That is what rehash does.
NOTE : For indexing the Segment table the first N bits of the improved hash are used, whereas for indexing the Entry table inside a segment the last N bits are used (see details in the example below).

Getting element from  ConcurrentHashMap : 

Get on a ConcurrentHashMap is very simple: no locks are involved in the common case. You simply read the data and return it -

        public V get(Object key) {
                int hash = hash(key.hashCode());
                return segmentFor(hash).get(key, hash);
        }

        V get(Object key, int hash) {
            if (count != 0) { // read-volatile
                HashEntry<K,V> e = getFirst(hash);
                while (e != null) {
                    if (e.hash == hash && key.equals(e.key)) {
                        V v = e.value;
                        if (v != null)
                            return v;
                        return readValueUnderLock(e); // recheck
                    }
                    e = e.next;
                }
            }
            return null;
        }


NOTE : The readValueUnderLock method is used as a backup in case a null (pre-initialized) value is ever seen in an unsynchronized access method.

Example

That was all code and theory. Now let's take an actual example.

Let's say we have created a ConcurrentHashMap with a concurrency level of 10. Based on this, the Segment array will be created using the following code -

    private static void printSegmentDetails(int concurrencyLevel) {
        int sshift = 0;
        int segmentMask = 0;
        int segmentShift = 0;

        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        segmentShift = 32 - sshift;
        segmentMask = ssize - 1;
        System.out.println("Segment array size :" + ssize);
        System.out.println("segmentShift : " + segmentShift);
        System.out.println("segmentMask : " + segmentMask);
    }


Output for 10 concurrency level:
Segment array size : 16
segmentShift : 28
segmentMask : 15

NOTE : As mentioned before, the segment array size is 2^n such that 2^n >= concurrency level. In this case 2^4 = 16.

Now that we have the segment table in place, let's simulate a put. We need to put a String "Aniket" as the key. We don't care about the value; just make sure it's not null.

  1. First we calculate the hashCode of the key.
  2. Then we rehash it for a better hash (as mentioned above).
  3. Then, based on the resulting hash, we find which segment it belongs to.
Remember the Segment table size was 2^N; we now want the first N bits of the hash to determine which segment it falls into, since N bits can index all 2^N segments, which is our segment array size. Also remember the code to get this index from above -
  • int segmentIndex = (hash >>> segmentShift) & segmentMask
This essentially means: logically right-shift the hash by segmentShift bits. Since an int is 32 bits and segmentShift = 32 - sshift, hash >>> segmentShift gives you the first sshift bits (sshift is nothing but the N in the 2^N we saw above). segmentMask then masks off those N bits after the shift.

So in this case,
N = sshift = 4
2^N = 16 -> size of the segment array
segmentShift = 32 - 4 = 28 (as we saw in the output above)
segmentMask = 16 - 1 = 15

    public static void main(String args[]) {    
        String key = "Aniket";
        //hascode of key
        System.out.println(key.hashCode());
        //better hash
        System.out.println(hash(key.hashCode()));
        //better hash in binary
        System.out.println(Integer.toBinaryString(hash(key.hashCode())));
        //logical right shift by segmentShift
        System.out.println("Right shifter hash : " + Integer.toBinaryString(hash(key.hashCode()) >>> 28));
        // segment index as binary and of right shift and segmentMask
        System.out.println("Segment Index : " + Integer.toBinaryString((hash(key.hashCode()) >>> 28 ) & 15));
        // segment index as decimal
        System.out.println("Segment Index : " + ((hash(key.hashCode()) >>> 28 ) & 15));
    }


Output :
1965716254
1839402854
1101101101000110000111101100110
Right shifter hash : 110
Segment Index : 110
Segment Index : 6


NOTE : 1101101101000110000111101100110 shows 31 bits because Integer.toBinaryString omits leading zeros and the leading (32nd) bit is 0. The same goes for all subsequent binary formats.

So your element with key "Aniket" will go into the Segment at index 6. Inside the segment, it is equally simple to calculate the index into the Entry array.

  •  int entryArrayindex = hash & (tab.length - 1);
         int entryArrayindex = (hash(key.hashCode()) & (16 - 1));
         System.out.println("Entry array index : " + entryArrayindex);
         System.out.println("Entry array index in binary : " + Integer.toBinaryString(entryArrayindex));


Output :
Entry array index : 6
Entry array index in binary : 110


So finally the Entry is inserted at index 6 of that segment's Entry table.

So to summarize: the first N bits of the improved hash are used to index the Segment table, whereas the last N bits are used to index the Entry table.


Related Links 

Sunday, 14 May 2017

Difference between ClassNotFoundException vs NoClassDefFoundError in Java

Background

In last post we saw how classloading works in Java -
In this post we will try to understand the difference between ClassNotFoundException and NoClassDefFoundError, which generally bugs all Java developers.

If you have not gone through the above link, I strongly suggest you do so right away. It will give you a very good understanding of the class-loading mechanism, which we will shortly use to understand these two situations.

Difference between ClassNotFoundException vs NoClassDefFoundError in Java

  • The first point to note is their types. ClassNotFoundException is a checked exception, so you need to handle it: either catch it or declare it in the method signature. NoClassDefFoundError is an Error; errors are generally something you cannot recover from, though you can still catch and handle one.
  • Both relate to a class not being available at runtime. The difference is the cause of the unavailability, which we will see shortly.
  • ClassNotFoundException is thrown by the running application, whereas NoClassDefFoundError is thrown by the Java runtime.
ClassNotFoundException
We already saw an example of ClassNotFoundException in the last post, when we tried to load our custom class with the Extension classloader, which was the parent of the Application classloader that actually loaded the class. We said a parent classloader has no visibility of classes loaded by its child classloaders, so it threw ClassNotFoundException. Generally, ClassNotFoundException is thrown when you try to load a class at runtime using methods like -
  • Class.forName()
  • ClassLoader.loadClass()
  • ClassLoader.findSystemClass()
and the required class is not found on the classpath. This could be because your classpath is incorrectly configured. The famous example is connecting to a database in Java, where you load the driver class using Class.forName() [we did this explicit loading before Java 6; from Java 6 onward the driver class is loaded automatically, as long as the driver jar is present on the classpath].
So for the above use case, if you don't have the driver class on your classpath, it will lead to ClassNotFoundException. Also, as you must have noticed by now, you need to handle this exception explicitly since it is a checked exception.

To resolve this issue you need to check that the class is available in your classpath.

NoClassDefFoundError:
Unlike ClassNotFoundException, this is an Error, which is hard to recover from. It generally happens when a class was available at compile time but is not available at runtime.

One example: a static method/block of a class throws an error, due to which the class fails to initialize (even though it was available at compile time and went through the compilation phase). If such a class is then referenced at runtime, NoClassDefFoundError is thrown. A sketch of this follows below.
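
A hedged sketch of that failure mode (class names are made up): the first reference fails with ExceptionInInitializerError, and every subsequent reference to the now-broken class fails with NoClassDefFoundError:

    public class NoClassDefDemo {

        static class Faulty {
            // Static initializer that always fails, so the class never initializes
            static int value = computeValue();

            static int computeValue() {
                throw new RuntimeException("boom during class initialization");
            }
        }

        public static void main(String[] args) {
            try {
                System.out.println(Faulty.value); // triggers class initialization
            } catch (ExceptionInInitializerError e) {
                System.out.println("First access  : " + e);
            }
            try {
                System.out.println(Faulty.value); // class is now in a failed state
            } catch (NoClassDefFoundError e) {
                System.out.println("Second access : " + e);
            }
        }
    }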

To resolve this error, you need to check your classpath. It is possible that in your build configuration (say in your Gradle files) you have added a jar to, let's say, the test configuration only and not to the runtime or compile-time configuration. You should also look out for any initialization errors in the logs.


Related Links

How classloader works in Java

Background

We know how Java works: we create Java classes, create instances of them, and they interact with each other. In this post we will see how classloaders work. We know javac is the compiler that converts human-readable Java code into class files containing bytecode that the JVM interpreter (java) understands. Classloaders are responsible for loading these classes at runtime. This is one of the good interview questions asked of experienced Java developers, and it should also help you understand the difference between NoClassDefFoundError and java.lang.ClassNotFoundException. So let's get to it.

Basic points

We will come to the details of these, but to begin with note down these points -
  1. Delegation - Each classloader first delegates loading of a class to its parent (all the way up the hierarchy). Only if the parent cannot load the class does the child try to load it itself. If none of the classloaders can load it, a ClassNotFoundException is thrown.
  2. Visibility - Each classloader knows about the classes its parents have loaded. It does not work the other way around: parents do not know about the classes loaded by their children. This brings us to the 3rd point.
  3. Uniqueness - Each class is loaded exactly once. Since each child delegates class loading to its parent and knows the classes its parents have loaded, it will try to load a class only when no parent has already loaded it.
These are, of course, the default behaviors of the classloaders that already exist. You can write your own classloaders and break them (not recommended though).

Classloading in Java

Java has 3 main classloaders that are used to load classes at runtime -
  1. Bootstrap ClassLoader (Also called Primordial classLoader)
  2. Extension ClassLoader
  3. Application  ClassLoader
In that order: Bootstrap is the parent of Extension, and Extension is the parent of the Application classloader. Each of these classloaders loads classes from a predefined location.


Let me reiterate each level -

  • Bootstrap ClassLoader is the topmost classloader. It does not have any parent and is a native implementation. It is responsible for loading all standard JDK classes, which it does from <JRE>/lib/rt.jar. Since it is a native implementation, it is not represented by a ClassLoader instance in Java.
  • Extension ClassLoader is the direct child of the Bootstrap classloader. When this classloader is asked to load a class, it first delegates to its parent, the Bootstrap ClassLoader. If the parent is unsuccessful, the Extension ClassLoader tries to load classes from <JRE>/lib/ext or from the path specified in the java.ext.dirs system property. In the JVM it is implemented by sun.misc.Launcher$ExtClassLoader.
  • Application ClassLoader is the child of the Extension classloader, and the execution sequence remains the same: it delegates to its parent Extension, which in turn delegates to its parent Bootstrap. If the parents are unsuccessful, the Application classloader tries to load the class from the classpath, which you can supply with the -classpath or -cp argument or specify in the manifest file of a jar. In the JVM it is implemented by sun.misc.Launcher$AppClassLoader.

If the Application classloader is also unable to load the class, a ClassNotFoundException is thrown. This is the order in which classloaders execute and load classes when the JVM runs.


 Code Demo

Let's try to understand a few things with code now. The first thing we discussed is the Bootstrap classloader: the topmost classloader, natively implemented, with no parent. You cannot refer to the Bootstrap classloader from Java; it shows up as null, precisely because it is a native implementation.

    public static void main(String args[]) throws InterruptedException, IOException {
        ClassLoader classLoader  = String.class.getClassLoader();
        System.out.println(classLoader==null?"Bootstrap classloader not available from Java":"Bootstrap classloader available from Java");
    }

and you should get -
Bootstrap classloader not available from Java

Now let's check the visibility principle: a parent should not be able to load classes loaded by its child. We will create a new class called HelloWorld, check which classloader loaded it (from our previous knowledge, classpath classes are loaded by the Application classloader), look at its parent (which should be the Extension classloader), and finally try to load the same HelloWorld class using that parent classloader. This should fail, as a parent has no visibility of classes loaded by its child -

    public static void main(String args[]) throws InterruptedException, IOException {
        ClassLoader classLoader  = HelloWorld.class.getClassLoader();
        System.out.println("Current classloader : " + classLoader);
        System.out.println("Current classloaders parent : " + classLoader.getParent());
        try {
            Class.forName("HelloWorld", true, HelloWorld.class.getClassLoader().getParent());
        } catch (ClassNotFoundException e) {
            System.out.println("Class could not be loaded by the classloader");
            e.printStackTrace();
        }
    }


Output is -
Current classloader : sun.misc.Launcher$AppClassLoader@4554617c
Current classloaders parent : sun.misc.Launcher$ExtClassLoader@677327b6
Class could not be loaded by the classloader
java.lang.ClassNotFoundException: HelloWorld
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at HelloWorld.main(HelloWorld.java:93)

You can also see for yourself how classes get loaded: just use the -verbose:class option while running java.


NOTE :
  • The JVM maintains a runtime pool in the PermGen area where classes are loaded. Whenever a class is referenced, the appropriate classloader finds the class on the classpath and loads it into this pool. This is not specific to user-defined classes versus classes provided in the JDK: whenever a class is referenced, it is loaded into memory.
  • A ClassLoader instance does not get GCed, as it is referenced by a JVM thread. In fact, that is why, even with a Singleton class, it is possible to create two instances using two different classloaders.
  • Otherwise, ClassLoader instances are the same as any other objects on the heap. The statement that they do not get GCed comes from the fact that classloaders are referenced by JVM threads, which run until the Java process completes and the JVM shuts down. For example, the Bootstrap classloader is a native implementation, meaning its code is embedded in the JVM, so its reference is always alive. Hence we say classloaders are not potential candidates for GC; other than that, GC treats them the same way.

Related Links

Sunday, 5 June 2016

Implementing blocking queue in Java

Blocking Queue

A blocking queue is a queue with a limit on the number of elements it can hold; once that limit is reached, an enqueuing thread must wait for some thread to dequeue elements and make space. Similarly, if the queue becomes empty, a dequeuing thread has to wait until some thread enqueues elements into it.


Let's see how we can implement it in Java.

Blocking queue implementation in Java

 Code is as follows - 

package com.osfg.models;

import java.util.LinkedList;
import java.util.Queue;

/**
 * 
 * @author athakur
 * Model class for blocking queue
 */
public class BlockingQueue<E> {
    
    private Queue<E> bQueue = new LinkedList<E>();
    private int maxQueueSize; 
    
    public BlockingQueue(int maxQueueSize) {
        this.maxQueueSize = maxQueueSize; 
    }
    
    public synchronized void enqueue(E e) throws InterruptedException {
        
        while(bQueue.size() == maxQueueSize) {
            wait();
        }
        bQueue.add(e);
        // notify if any thread is waiting to dequeue as data is now available
        notifyAll();
    }
    
    public synchronized E dequeue() throws InterruptedException{
        
        while(bQueue.size() == 0) {
            wait();
        }
        E e = bQueue.remove();
        notifyAll();
        return e;
        
    }

}

Notice how we are using the wait() and notifyAll() calls. Also observe that both the enqueue and dequeue methods are synchronized, so only one thread can execute them at a time. You can also find this code in my data structures GitHub repository. Also see the Thread Pool implementation in Java, which uses this blocking queue internally.


A producer-consumer design can be built using a blocking queue: producers enqueue jobs into the queue and wait when the queue is full, whereas consumers dequeue jobs from the queue and wait when the queue is empty. A famous example of the producer-consumer design is a thread pool implementation, where you enqueue your tasks into the queue and threads from the thread pool dequeue and process them. As mentioned before, you can see the code for the blocking queue and the thread pool in my GitHub repository (links in the Related Links section below). A usage sketch follows below.
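
Here is a hedged usage sketch of the BlockingQueue class above in a producer-consumer setup (capacity and element counts are arbitrary; assumes the class above is on the classpath):

    public class ProducerConsumerDemo {
        public static void main(String[] args) {
            BlockingQueue<Integer> queue = new BlockingQueue<>(2); // capacity 2

            // Producer: blocks inside enqueue() whenever the queue is full
            new Thread(() -> {
                try {
                    for (int i = 0; i < 5; i++) {
                        queue.enqueue(i);
                        System.out.println("Produced : " + i);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();

            // Consumer: blocks inside dequeue() whenever the queue is empty
            new Thread(() -> {
                try {
                    for (int i = 0; i < 5; i++) {
                        System.out.println("Consumed : " + queue.dequeue());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }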

Related Links



Saturday, 4 June 2016

Simple program to create deadlock between two threads and it's fix

Background

This is one of the very basic Java multithreading questions: write a simple Java program that demonstrates a deadlock. In this post I will provide code that does exactly that.


Java code to create deadlock between two threads

Code is as follows - 

/**
 * 
 * @author athakur
 * Simple deadlock program
 */
public class Deadlock {
    
    public static void main(String args[]) {
        
        final Object resourceOne = "res1";
        final Object resourceTwo = "res2";
        
        new Thread(new Runnable() {
            
            @Override
            public void run() {
                
                synchronized(resourceOne) {
                    System.out.println(Thread.currentThread().getName() + " accquired resource 1");
                    try {
                        Thread.sleep(2000);
                    } catch (InterruptedException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                    }
                    synchronized(resourceTwo) {
                        System.out.println(Thread.currentThread().getName() + " accquired resource 2");
                    }
                }
            }
        }).start();
        
        new Thread(new Runnable() {
            
            @Override
            public void run() {
                synchronized(resourceTwo) {
                    try {
                        Thread.sleep(2000);
                    } catch (InterruptedException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                    }
                    System.out.println(Thread.currentThread().getName() + " accquired resource 2");
                    synchronized(resourceOne) {
                        System.out.println(Thread.currentThread().getName() + " accquired resource 1");
                    }
                }
            }
        }).start();
        
    }

}


Copy the above code into a Deadlock.java file, compile it, and run it. You should see the following output in the console -
Thread-0 accquired resource 1
Thread-1 accquired resource 2

And the program should hang.

Explanation

Thread 1 acquires the lock on resource 1 and sleeps for 2 seconds. After the sleep it tries to acquire the lock on resource 2. While thread 1 was sleeping, thread 2 acquired the lock on resource 2 and also slept for 2 seconds. After 2 seconds, thread 2 tries to acquire the lock on resource 1. Now it does not matter which thread wakes up first: neither can get the lock on the inner resource, because the other thread holds it while waiting for the first. This leads to deadlock.

How to resolve this deadlock?

You can use ReentrantLock, introduced in Java 1.5, to check whether a lock is available before taking it, or do a timed lock (if the lock is not available within x time, move on). A sketch follows below.
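
A hedged sketch of the timed-lock approach with ReentrantLock (lock names and timeouts are arbitrary); if the second lock cannot be acquired in time, the thread releases the first lock and backs off instead of deadlocking:

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.ReentrantLock;

    public class TryLockDemo {
        private static final ReentrantLock lockOne = new ReentrantLock();
        private static final ReentrantLock lockTwo = new ReentrantLock();

        static void doWork() throws InterruptedException {
            while (true) {
                // Timed acquisition: give up instead of blocking forever
                if (lockOne.tryLock(1, TimeUnit.SECONDS)) {
                    try {
                        if (lockTwo.tryLock(1, TimeUnit.SECONDS)) {
                            try {
                                System.out.println("Acquired both locks, doing work");
                                return;
                            } finally {
                                lockTwo.unlock();
                            }
                        }
                    } finally {
                        lockOne.unlock();
                    }
                }
                // Could not get both locks: back off and retry
                Thread.sleep(100);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            doWork();
        }
    }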

Another way to resolve this is to always acquire locks in a fixed order. So both threads T1 and T2 will always lock resource 1 before trying to acquire the lock on resource 2. This breaks the cycle that can lead to deadlock.

There is one more, admittedly stupid, way to prevent deadlock: when you are spawning new threads from the main thread, call join() on each new thread before calling start() on the next one. Stupid? I know. It is the same as a single-threaded application, since you are executing the threads sequentially.

A couple of tips that may come in handy -
  • don't use multiple threads (like Swing does, for example, by mandating that everything is done in the EDT)
  • don't hold several locks at once. If you do, always acquire the locks in the same order
  • don't execute foreign code while holding a lock
  • use interruptible locks



Related Links

Thursday, 28 April 2016

Remove repetitive words from a String and return it

Question

Assume you are given a string. Write a function that removes consecutive repeated words from the string and returns the result. For example,

"Beginning with the first first first first manned Gemini mission in March 1965, commemorative commemorative medallions were prepared for the astronauts at their request. It is unclear who prepared these early medallions only only, only that each individual box containing a medallion bore the word Fliteline."

to

"Beginning with the first manned Gemini mission in March 1965, commemorative medallions were prepared for the astronauts at their request. It is unclear who prepared these early medallions only, only that each individual box containing a medallion bore the word Fliteline."

Make sure you preserve punctuation in the right positions.


Solution

I have written the following Java code with various methods that solve variations of the above question -


import java.util.HashSet;
import java.util.Set;
import java.util.Stack;
import java.util.StringTokenizer;

/**
 * 
 * @author athakur
 *
 */
public class RemoveDuplicates {

    public static void main(String args[]) {

        String input = "Beginning with the first first first first manned Gemini mission in March 1965, commemorative commemorative medallions were prepared for the astronauts at their request. It is unclear who prepared these early medallions only only, only that each individual box containing a medallion bore the word Fliteline.";

        System.out.println(removeConsecutiveDuplicates(input));
        System.out.println(removeConsecutiveDuplicatesBetter(input));
        System.out.println(removeAllDuplicates(input));

    }

    public static String removeAllDuplicates(String input) {
        StringBuilder outputBuilder = new StringBuilder();
        Set<String> wordsSet = new HashSet<>();
        StringTokenizer tokenizer = new StringTokenizer(input, " ");
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            if (wordsSet.add(word)) {
                outputBuilder.append(word);
                outputBuilder.append(" ");
            }
        }
        return outputBuilder.toString();
    }

    public static String removeConsecutiveDuplicatesBetter(String input) {

        StringBuilder outputBuilder = new StringBuilder();
        String lastWord = "";
        StringTokenizer tokenizer = new StringTokenizer(input, " ");
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            if (!word.equalsIgnoreCase(lastWord)) {
                outputBuilder.append(word);
                outputBuilder.append(" ");
            }
            lastWord = word;
        }
        return outputBuilder.toString();

    }

    public static String removeConsecutiveDuplicates(String input) {

        if (input == null)
            return null;

        Stack<String> wordsStack = new Stack<>();
        String[] words = input.split(" ");
        for (int i = 0; i < words.length; i++) {
            wordsStack.push(words[i]);
        }

        String lastWord = "";
        String stringWithNoConsecutiveDuplicates = "";

        while (!wordsStack.isEmpty()) {
            String tempPop = wordsStack.pop();
            if (!lastWord.equalsIgnoreCase(tempPop)) {
                stringWithNoConsecutiveDuplicates = tempPop + " "
                        + stringWithNoConsecutiveDuplicates;
            }

            lastWord = tempPop;
        }

        return stringWithNoConsecutiveDuplicates;

    }

}


Note

A couple of points to note. Prefer the removeConsecutiveDuplicatesBetter approach, as it does not keep all the data in memory. So if you have a huge file to run this operation on, you can read it line by line and tokenize each line for further processing.

Also note that HashSet's add method returns false if the data is already present in the set.

The Stack method is more suitable if you want to retain those duplicate consecutive words that appear last in a line. Irrespective of your approach, you need to take care of the following -

  • Don't keep all the data in memory (data can be very large). Using a Stack may lead to overflow.
  • Use minimal space and efficient lookup. HashSet internally uses a HashMap to store the data, hence put/get are ~O(1).
  • Avoid String concatenation. Use StringBuilder instead.

As you must have already noticed, the above approaches do not take care of punctuation. You will need separate processing for that.

Related Links







Friday, 13 February 2015

Understanding how hashing (HashMap and HashSet) work in Java

Background

Hashing is a very important concept in computer programming and a very popular subject for interview questions. Two important data structures related to hashing are HashMap and HashSet, which are exactly what we are going to look at in this post.

Overview

If you are from a computer science background, you will know that HashMap is a data structure that stores key-value pairs, whereas a HashSet stores unique data. In a HashMap we have buckets, and each entry added to the HashMap falls into one of the buckets depending on its hash value. You must also have heard that adding and retrieving objects in a HashMap happens with time complexity O(1).

But there are still open-ended questions, like -
  • What happens when two objects added to a HashMap have the same hash (code) value? - a situation typically known as a collision.
  • If that is handled, how do get and put work in a HashMap? ... and so on.
We will address them now.

Understanding how HashMap works

 You can visualize HashMap as follows-




So you have an array, and each array position is essentially a linked list. As you know, you don't have to specify the size of a HashMap; it grows dynamically, like an ArrayList. The main data structure is essentially an array.

    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry[] table;


When you create a HashMap you either choose to provide an initial capacity or, when you don't, a default value is used.

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 16;


 When the table is created, it has either this default initial capacity or the capacity you provide in the constructor.

The next important thing is when the table should be resized to keep up with the dynamically changing size of the HashMap. The answer depends on a threshold determined by the load factor.

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;


HashMap also provides a constructor to initialize the load factor. So when your data size reaches

  • threshold = table capacity * load factor

the HashMap table is resized. The constructors are as follows -

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }


How is data with keys generating same hash code stored in HashMap?

As mentioned earlier, each index holds a reference to an object of type Entry, which is a linked-list node with a next pointer. When an entry is added to the hash map, its key's hash code is computed, which determines the index at which it should be put. If that index already holds an entry, the new entry is added at the start of the linked list and the existing list is appended after it.

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

    void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }


Also note how null is handled: yes, null is an acceptable key in a HashMap.

So how is data retrieved if two keys generate same hash code?

The same hash code makes the lookup for both keys land on the same index of the table. From there, each Entry object is iterated over and its key compared with the search key (yes, both key and value are stored in the Node/Entry object!). On a successful match, the corresponding value is returned.

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

How data is stored in HashMap

 First of all, the Node array size is always 2^N. The following method guarantees it -

static final int tableSizeFor(int cap) {

    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

So lets say you provide initial capacity as 5
cap = 5

n = cap - 1 =  4 = 0 1 0 0
n |= n >>> 1;    0 1 0 0 | 0 0 1 0 = 0 1 1 0 = 6
n |= n >>> 2;    0 0 1 1 | 0 1 1 0 = 0 1 1 1 = 7
n |= n >>> 4;    0 0 0 0 | 0 1 1 1 = 0 1 1 1 = 7
n |= n >>> 8;    0 0 0 0 | 0 1 1 1 = 0 1 1 1 = 7
n |= n >>> 16;   0 0 0 0 | 0 1 1 1 = 0 1 1 1 = 7
return n + 1     7 + 1 = 8 


So table size is 8 = 2^3

Now the possible index values you can put your element at are 0-7, since the table size is 8. Now let's look at the put method. It computes the bucket index as follows -

Node<K,V> p =  tab[i = (n - 1) & hash];

where n is the array size, so n = 8. It is effectively the same as saying

Node p = tab[i = hash % n];

So all we need to see now is why, when n is a power of two,

hash % n == (n - 1) & hash

Let's again take an example. Say the hash of a value is 10.

hash = 10
hash % n = 10 % 8 = 2
(n - 1) & hash = 7 & 10 = 0 1 1 1 & 1 0 1 0 = 0 0 1 0 = 2

So it is an optimized modulo operation using the bitwise & operator.

A good hash function

A hash function is a method that computes the hash of a key, which determines where the data is stored. It should obviously return a value between 0 and (n-1) for an array of length n used to store the data.

A good hash function takes minimal computation time and distributes keys evenly across the array.
For an array of size n with m elements inserted, the load factor would ideally be

α = m/n

A hash function can be thought of as two parts -
  1. Hash code map (Key -> Integer)
  2. Compression map (Integer -> [0, N-1])
A simple hash function (compression map) for an array of size m would be

  • h(k) = k mod m, where k is the key and m is the size of the array.

NOTE : In the above function h(k) you need to take care that m is not a power of 2. If it is, you are effectively using only the last log2(m) bits of the number to compute the hash, which is not a good method.

So choose m to be a prime not too close to a power of 2.


Another compression map is the multiplication method -

  • h(k) = floor(m * (k*A mod 1)), where k is the key, m is the size of the array, and A is a constant between 0 and 1, i.e. 0 < A < 1.
NOTE : For Strings, avoid adding up the ASCII values of the characters; it is not a good hash function, since many words will produce the same sum and map to the same bucket, resulting in more collisions. Use a polynomial function instead. So if the integer values of the ASCII chars are c0, c1, c2, use a polynomial like -

c0 + c1*x + c2*x^2

The String class in Java uses the following hash method -

    /**
     * Returns a hash code for this string. The hash code for a
     * <code>String</code> object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using <code>int</code> arithmetic, where <code>s[i]</code> is the
     * <i>i</i>th character of the string, <code>n</code> is the length of
     * the string, and <code>^</code> indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
    int h = hash;
        int len = count;
    if (h == 0 && len > 0) {
        int off = offset;
        char val[] = value;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    } 



Other hashing techniques are
  1. Linear probing :

    if table is full -> error
    probe = h(k)
    while table[probe] is not empty
              probe = (probe + 1) mod m
    table[probe] = k
  2. Double hashing :

    if table is full -> error
    probe = h1(k)
    offset = h2(k)
    while table[probe] is not empty
              probe = (probe + offset) mod m
    table[probe] = k
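
For completeness, a minimal hedged sketch of linear probing in Java (a fixed-size integer table, no resizing or deletion, purely illustrative):

    public class LinearProbingTable {
        private final Integer[] table;

        public LinearProbingTable(int capacity) {
            table = new Integer[capacity];
        }

        // Insert k, probing forward one slot at a time on collision
        public void insert(int k) {
            int m = table.length;
            int probe = Math.floorMod(k, m); // h(k) = k mod m
            for (int i = 0; i < m; i++) {
                if (table[probe] == null) {
                    table[probe] = k;
                    return;
                }
                probe = (probe + 1) % m;
            }
            throw new IllegalStateException("table is full");
        }

        public boolean contains(int k) {
            int m = table.length;
            int probe = Math.floorMod(k, m);
            for (int i = 0; i < m && table[probe] != null; i++) {
                if (table[probe] == k) return true;
                probe = (probe + 1) % m;
            }
            return false;
        }
    }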

NOTE : In Java, if hashing is involved (say you are using a HashMap or a HashSet), make sure you override the equals() and hashCode() methods to suit your requirements. As much as is reasonably practical, the hashCode() method defined by class Object does return distinct integers for distinct objects (this is typically implemented by converting the internal address of the object into an integer).

HashMap changes in Java8

Performance has been improved by using balanced trees instead of linked lists in buckets under specific circumstances. This has been implemented in the classes
  1. java.util.HashMap,
  2. java.util.LinkedHashMap and
  3. java.util.concurrent.ConcurrentHashMap.
This improves the worst-case performance from O(n) to O(log n).

Lastly, let's look at HashSet.

Understanding HashSet

Well, if you are still guessing at the data structure behind HashSet, the following may come as a surprise -

    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();



Yup, HashSet stores its data as keys of a HashMap with a dummy value. Its constructors are equivalent to HashMap's -

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
    map = new HashMap<E,Object>();
    } 



    public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<E,Object>(initialCapacity, loadFactor);
    }



    public HashSet(int initialCapacity) {
    map = new HashMap<E,Object>(initialCapacity);
    }


Add and Remove methods are as follows -

    public boolean add(E e) {
    return map.put(e, PRESENT)==null;
    }



    public boolean remove(Object o) {
    return map.remove(o)==PRESENT;
    }

Understanding LinkedHashMap

LinkedHashMap, as we know, preserves insertion order. However, it also extends HashMap, so it retains O(1) insertion time as well. So how does it really work?

LinkedHashMap maintains another type of Entry, which holds pointers to the previous as well as the next Entry node.

    static class Entry<K,V> extends HashMap.Node<K,V> {
        Entry<K,V> before, after;
        Entry(int hash, K key, V value, Node<K,V> next) {
            super(hash, key, value, next);
        }
    }

LinkedHashMap also stores the head and tail of this doubly linked list, and that is how it maintains the insertion order. So even though put follows hashed storage, retrieval order is maintained using the doubly linked list.

    /**
     * The head (eldest) of the doubly linked list.
     */
    transient LinkedHashMap.Entry<K,V> head;
    /**
     * The tail (youngest) of the doubly linked list.
     */
    transient LinkedHashMap.Entry<K,V> tail;

So whenever you add a new key-value pair, the following happens -

    Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) {
        LinkedHashMap.Entry<K,V> p =
            new LinkedHashMap.Entry<K,V>(hash, key, value, e);
        linkNodeLast(p);
        return p;
    }
    // link at the end of list
    private void linkNodeLast(LinkedHashMap.Entry<K,V> p) {
        LinkedHashMap.Entry<K,V> last = tail;
        tail = p;
        if (last == null)
            head = p;
        else {
            p.before = last;
            last.after = p;
        }
    } 

When iterating, it walks from head to tail, thereby providing the same order as insertion.

NOTE : The before and after pointers are in addition to the next pointer inherited from the HashMap.Node class. next points to the next node with the same hash (the collision scenario), thereby preserving O(1) lookups, while before and after guarantee insertion-order iteration. A small demo follows below.
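
A small hedged demo contrasting iteration order (keys are arbitrary); HashMap's order depends on the hashes, while LinkedHashMap reproduces insertion order:

    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class IterationOrderDemo {
        public static void main(String[] args) {
            Map<String, Integer> hashMap = new HashMap<>();
            Map<String, Integer> linkedMap = new LinkedHashMap<>();

            for (String key : new String[]{"banana", "apple", "cherry", "date"}) {
                hashMap.put(key, key.length());
                linkedMap.put(key, key.length());
            }

            // Order depends on each key's hash, not on insertion
            System.out.println("HashMap       : " + hashMap.keySet());
            // Always prints in insertion order: [banana, apple, cherry, date]
            System.out.println("LinkedHashMap : " + linkedMap.keySet());
        }
    }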

Understanding TreeMap

TreeMap, as you already know, stores data in sorted order. If the keys it stores implement the Comparable interface, it stores data in that natural order, or you can pass a custom Comparator to the TreeMap and it will use that to sort and store the data.

TreeMap maintains a binary search tree structure: it has a root, values less than the root go to the left, values greater than the root go to the right, and the tree is kept balanced (it is a red-black tree, a kind of self-balancing binary search tree).

    /**
     * The comparator used to maintain order in this tree map, or
     * null if it uses the natural ordering of its keys.
     *
     * @serial
     */
    private final Comparator<? super K> comparator;
    private transient Entry<K,V> root;

A simplified snippet of the insertion logic would be -

        Entry<K,V> e = new Entry<>(key, value, parent);
        if (cmp < 0)
            parent.left = e;
        else
            parent.right = e;

where cmp is the comparison value obtained either from the Comparator's compare() method or from Comparable's compareTo() method. A usage demo follows below.
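
A brief hedged demo of both orderings (keys and values are arbitrary):

    import java.util.Comparator;
    import java.util.TreeMap;

    public class TreeMapDemo {
        public static void main(String[] args) {
            // Natural ordering: String implements Comparable
            TreeMap<String, Integer> natural = new TreeMap<>();
            natural.put("banana", 6);
            natural.put("apple", 5);
            natural.put("cherry", 6);
            System.out.println(natural.keySet()); // [apple, banana, cherry]

            // Custom ordering via a Comparator
            TreeMap<String, Integer> reversed = new TreeMap<>(Comparator.reverseOrder());
            reversed.putAll(natural);
            System.out.println(reversed.keySet()); // [cherry, banana, apple]
        }
    }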

Related Links
